Project
Title
Description
Hardware
Software
Electrical component themed AI detection and identification.
A mounted camera above a surface (part of the product).
Produces a controlled environment live feed for the application.
An application running inference on a live USB camera feed (optionally an imported picture or video)
Application
Modification of the provided data to simulate differences in the environment that would occur in a real-life scenario, and to provide a challenge in the form of imperfections to both train and test against.
Augmentation examples
Addition of glare
Rotation
Blurring
Addition of spots
GUI
The GUI application is developed with Qt Creator, using C++
Inference
Running on C++
Utilising
Summary
Detection of objects from an image via Inference
Detect and display boundaries for each identified class, and the confidence value of
this detection from the input image using Inference.
Identification
Post-processing of the components in the bounding boxes detected by inference,
which may have additional information that can be identified by a variety of
approaches.
Examples
LEDs
Resistors
Resistor code value
LED color
Technology
AI based Electronic Component Identifier
IC Components
Pin count
Information written on the component
Features
Inference
Classes present in the dataset that the model will be trained upon
Resistor
Diode
Capacitor
LEDs
Integrated Circuits
AC
DC
LDR
Milestones
Base camera rig
Initial inference model training
Inference running
Testing with video footage from a mobile device
Research
Pre-Trained Models
Ultralytics YOLO
Focus Audience
Set Rig
The set position of the camera, the significantly reduced distance to the objects, the consistency of the lighting provided by the ring light, and the static background will boost the confidence of the inference considerably.
Training
Running
Post-processing
Rationale
Timeline
Gantt Chart
Live training
YOLOv5
Training
1st batch, test run
Image Count
100
Classes (1 Total)
Resistor
2nd batch
Image Count
Training
Evaluation
20
Training
Evaluation
1800
540
3rd batch
Image Count
Training
Evaluation
2393
724
Classes (9 Total)
red_led
green_led
blue_led
yellow_led
ac_capacitor
dc_capacitor
resistor
sip_resistor
pcb_terminal
Classes (10 Total)
red_led
green_led
blue_led
yellow_led
ac_capacitor
dc_capacitor
resistor
sip_resistor
pcb_terminal
metal_nut
Augmentation
Default
Augmentation
Default
Average time per epoch
34 seconds
Epoch count
400
Epoch count
250
Epoch count
300
Augmentation
Default
YOLOv8
Due to the angle and lighting both being known and mostly set thanks to using a set
rig, the input dataset does not need to cover angles and lighting outside what the rig
will expose it to during runtime.
The sum of all the points covered above results in a significant reduction in data
required to train when compared to a setup without a set rig, for equivalent
confidence values during runtime.
The angle range is reduced to a single top-down view, eliminating the rest of the angle range.
While the lighting will change depending on the room conditions, the ring light
around the camera will provide significant consistency in lighting.
While this does not eliminate the necessity to train against various lighting conditions, it does reduce their significance and increase the certainty of the detection.
Only the components being detected need to be trained in all angles, as opposed to the camera gathering the dataset needing to be positioned at different angles.
Having a top-down view also eliminates the majority of issues that come with glare from high-luminosity bodies, such as clouds or the sun.
A set rig significantly limits the distance that the objects will be from the camera
during runtime, allowing for further confidence in the predictions.
Static background
Angle range
Lighting
Apart from dust or unexpected objects present on the rig's surface, which should be
removed before usage - the background that the objects are in front of will stay
mostly consistent.
This reduces the necessity to gather data of the same object under backgrounds
that are not expected to be used during runtime.
While this project may be retrained and refocused to be utilised for many different
fields - it is trained for electrical component identification, which is focused towards
engineers.
Architectures
This project focuses on both existing engineers, and ones that are interested in
becoming engineers.
Having access to the quick identification of components, the count of each, and any potential additional information provided by the project saves time spent manually analysing this information.
Average time per epoch
2 minutes
Average time per epoch
2 minutes and 20 seconds
SIP Resistor
Singular
Acronyms
SIP - Single Inline Package
Introduction
GPU - Graphics Processing Unit
CPU - Central Processing Unit
AI - Artificial Intelligence
LDR - Light Dependent Resistor
LED - Light Emitting Diode
AC - Alternating Current
DC - Direct Current
PCB terminal
PCB - Printed Circuit Board
The most prominent color may be identified by binning all the pixels of the image by their hue values, and checking which hue is most active.
The color codes can be identified by processing the image using filters and other operations until only the prominent colors remain.
These can be processed into the actual ohm value.
Then, the positions of the color codes relative to the body of the resistor can be used
to identify the specific positions and order of the color codes.
The pin count can be identified by processing the image using filters until there is a
clear contrast between the body of the chip, and the pins.
One approach that could help identify the number of pins would be drawing a line
between two of the pins and seeing how many of the pins touch this line. Taking the
line that touches the most pins would provide the pin count of this IC.
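The scan-line idea above can be sketched in code. This is an illustrative Python sketch, not the project's actual implementation: it assumes the filtered IC image has already been reduced to a binary mask (1 = pin pixel), counts runs of pin pixels on each scan line, and takes the line that touches the most pins.

```python
# Illustrative sketch (assumed inputs, not the project's code): estimate an
# IC's pin count from a binary mask by counting pin runs per scan line.

def runs_on_line(row):
    """Count contiguous runs of 1s (candidate pins) on one scan line."""
    runs, inside = 0, False
    for px in row:
        if px and not inside:
            runs += 1
            inside = True
        elif not px:
            inside = False
    return runs

def estimate_pin_count(mask):
    """Take the scan line that crosses the most pins, as described above."""
    return max(runs_on_line(row) for row in mask)

# Toy binary mask: one side of a 4-pin package after high-contrast filtering.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # background only
    [0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0],  # crosses all 4 pins
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0],  # clips only 3 pins
]
print(estimate_pin_count(mask))  # 4
```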
OCR may be used to extract the text based information.
OCR - Optical Character Recognition
Software-based reading of alphabetical characters from an image that contains written text.
Input image
Inference method
Algorithm method
Different color LEDs may be trained as individual classes.
Has the disadvantage of requiring training for each individual LED separately, as
opposed to one generic LED.
Has the advantage of working on any LED.
Raw input
High contrast filter
Colors histogram
Approaches after filtering
Results show clearly prominent yellow, which is accurate.
Has the disadvantage of potentially giving false information if the background is too
vibrant.
Contrast approach
HSV
Taking the average of all the pixels hue values that have a value above a certain
threshold. Around 0.7 on a range from 0 to 1 should be appropriate.
Hue is in the range of 0 to 360 degrees.
The pink dots represent the pixel values obtained from the previous step.
Taking the average of this data, the result lands on a degree value that can easily be determined as yellow, by separating the hue circle into color sections by degree ranges.
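The thresholded hue-averaging step above can be sketched as follows. This is an illustrative Python sketch with assumed (hue°, saturation, value) pixel tuples; note that because hue is circular (0° wraps to 360°), a circular mean is safer than a plain average, which would fail for reds near the wrap-around point.

```python
# Illustrative sketch (assumed pixel format, not the project's code): average
# the hue of all pixels whose value exceeds the 0.7 threshold described above.
import math

def dominant_hue(pixels, value_threshold=0.7):
    """Circular mean of hues for pixels brighter than the threshold."""
    bright = [h for h, _, v in ((h, s, v) for h, s, v in pixels)
              if v > value_threshold]
    x = sum(math.cos(math.radians(h)) for h in bright)
    y = sum(math.sin(math.radians(h)) for h in bright)
    return math.degrees(math.atan2(y, x)) % 360

# Toy LED crop: bright yellowish pixels, plus dark background pixels that
# fall below the 0.7 value threshold and are therefore ignored.
pixels = [(85, 0.9, 0.95), (92, 0.8, 0.9), (88, 0.85, 0.92),
          (200, 0.2, 0.1), (10, 0.1, 0.05)]
print(round(dominant_hue(pixels)))  # 88 - inside the 72°-108° yellow band
```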
HSV, or Hue Saturation Value, is an alternative way to represent colors.
It can be advantageous over RGB in situations such as this.
RGB - Red Green Blue
Commonly used to refer to a way of defining colors by their Red, Green and Blue properties.
HSV - Hue Saturation Value
Commonly used to refer to a way of defining colors by their Hue, Saturation and Value properties.
Yellow is between 72° and 108° on the hue circle.
Note: This example would ignore colors that are darker than 0.7, on a range of 0 to
1.
Sorted from highest priority, to lowest.
Setting up the camera on a rig.
Base GUI
GUI with essentials to interface the camera through USB, with
A live display from the camera on the rig.
Ability to export images by pressing a button.
Support for running Inference.
~100 images of a single class, taken from the rig for initial training and testing of the
model.
Initial dataset gathering
For the purpose of testing inference on the rig.
Proof of concept. The results will not be perfect as the dataset is minimal, and only
contains 1 class.
Further dataset gathering
At least 250 pictures of each class of every component that the project is designed
to detect.
Further model training
This training will take considerably longer than the initial training - around 2 minutes per epoch - and should be run for at least 300 epochs.
The initial training should not take long at all, and does not need to be polished. Training for ~100 epochs should be sufficient, with each epoch taking ~20 seconds on the machine available.
Rig
Model Training via Deep Learning
Machine used
Personal Computer
CPU
GPU
AMD Ryzen™ 7 5800X3D
Core count
8
Base clock frequency
3.4GHz
L3 Cache
96MB
Maximum operating temperature
90°C
Thread count
16
GeForce RTX 3060 Ti
Memory
8192MB
CUDA core count
4864
Capacity
Type
GDDR6X
Base clock frequency
1.41GHz
The goal is to reach confidence values of 0.8, on a range of 0 to 1.
Ability to gather further information from the detection bounding boxes provided by
the inference.
After the previous steps are in good shape, investigation of moving the inference to
a mobile device will begin.
If the confidence values are not up to standard, more data will be gathered from this
and potentially other mobile devices, and further training will follow, until the results
are adequate.
If adequate results are achieved before the deadline of this project, deployment to a
mobile device will be started.
If the frame rates are not sufficient, the inference may be run on still images to improve user experience.
Optional: Ability to label the images from the device, without requiring external
software.
It may be advantageous, given the timeframe of the project, to instead gather data during a session and label it afterwards.
Memory
Capacity
Type
2x16GB
DDR4
Frequency
3.2GHz
Brand
Corsair
Name
Vengeance RGB PRO SL
Link
https://www.corsair.com/ww/en/Categories/Products/Memory/Vengeance-RGB-PRO-SL-White/p/CMH16GX4M2E3200C16W
Brand
AMD
Name
Ryzen 7 5800X3D
Link
https://www.amd.com/en/products/cpu/amd-ryzen-7-5800x3d
Brand
NVIDIA
Series
30
Name
RTX 3060Ti
Link
https://www.nvidia.com/en-gb/geforce/graphics-cards/30-series/rtx-3060-3060ti/
CUDA
Special cores that are designed for compute-intensive tasks.
These run in parallel with the CPU, and may also run in parallel across multiple GPUs.
They are perfect for deep learning, as deep learning is incredibly compute-intensive.
Deep learning training times are predictable, and stay mostly constant between
epochs.
This means that there are no race conditions, and the more processing power
available, the quicker the epoch will finish.
Each of these steps should be polished before continuing to the next one, to provide
a solid foundation for the next step to be based on.
Analysis of the models
Brief History
YOLO, which stands for You Only Look Once, is a popular image segmentation and object detection model that was originally developed by Joseph Redmon and Ali Farhadi.
The first version was released in 2015, and it very quickly became popular due to its significantly superior speed and accuracy when compared to other architectures.
YOLOv1
YOLOv4
Released in 2020, introducing Mosaic data augmentation and a new, improved loss function - decreasing the time taken to achieve better results for the trained model.
YOLOv5
Released in 2020, introducing support for Object Tracking - which allows following a moving object - and Panoptic Segmentation, which allows identification of overlapping objects with accurate bounding boxes.
Ultralytics YOLOv8
The latest version of YOLO as of today. YOLOv8 is a state-of-the-art model that
builds upon the already very successful previous YOLO versions, introducing new
performance and flexibility features.
Full support for previous YOLO versions, making it incredibly convenient for existing
users of previous YOLO versions to take advantage of the new features.
Versions
Comparison
In general, YOLOv8 is superior to all of its predecessors.
While YOLOv5 mostly underperforms when compared to the later versions, it is important to note how minimal the delays are even on a now-outdated version.
YOLO offers pre-trained models that are used as a starting point for training custom models.
Each model has its advantages and disadvantages, and should be picked
depending on the project.
Size
mAP single-model single-scale values while detecting on the COCO val2017
dataset.
Speed
Averaged time taken using the Amazon EC2 P4d instance on the COCO dataset.
The pixel height and width the model operates up to.
Params (In Millions)
The number of parameters that are tweaked per epoch while training, and
processed during inference.
FLOPS
Floating Point Operations Per Second
A measure based on Floating Point Operations that is relevant in the field of Deep
Learning.
Diminishing returns can be observed in the mAP values when compared to the time taken (speed).
Model properties
In some circumstances, maximum precision is essential and is prioritised over the hardware requirements. This is when a larger model should be chosen.
In the scope of this project - several pre-trained models have been used, including
both YOLOv5 and YOLOv8, for the purpose of comparison.
In comparison of YOLOv5 and YOLOv8 versions - a clear advantage can be seen
when taking into account the size of the model (param count), and the resulting mAP
output, as well as the time taken.
Architecture choice
YOLO has been chosen as the architecture that this project utilises for the AI
detection.
At the start of the project, there was already a high bias towards YOLO due to the
highly positive personal past experience with YOLOv5 and all the incredible features
that it offers.
Upon the release of YOLOv8, with all the superior features and specifications that it provides on top of the previous versions - which this project may expand into - the YOLO family was an obvious choice of architecture for the project.
As the name suggests, YOLO focuses on detection of multiple classes in a single
"look", which is a single analysis of the entire input image.
Compared to many architectures before YOLO, this is a far superior approach: no matter how quick the other architectures may be, they approach detection by reanalysing the entire image for every single class that the model was trained for - increasing the time taken per detection additively per class.
An approach like this may seem too good to be true, as if it should come with a significant cost to the speed and confidence of the model.
But when the results are analysed - that could barely be further from the truth.
YOLO is an incredibly efficient and accurate architecture.
These days most sophisticated architectures approach object detection similarly to
YOLO, but YOLO is still a state-of-the-art architecture that continues to improve and
grow to this day.
Internal AI Object Detection steps
Classification
Object Detection
Segmentation
The process of identifying the exact bounding box of the item detected.
The bounding, by a box, of the classified segments of the image.
The identification of a part of an image believed to contain an item of a class the model was trained to detect.
Visual examples
Resizing
Joining up of multiple images to create new ones
The reduction of data required to train makes it feasible to train relatively high-quality models from data gathered, and models trained, at home.
Marking Codes
Hardware
Raspberry Pi 4
Beaglebone
Nvidia Jetson Nano
Intel Neural Compute Stick 2
Specifications
Processor Base Frequency
700MHz
Memory
2GB
Specifications
Core Count (SHAVE)
16
Advantage
Offers computational power through a USB connection - can be used to run
Inference on existing devices, such as a laptop.
Specifications
Specifications
Core Count
4
Maximum Frequency
700MHz
Resistors and Inductors
Capacitors
ICs
Color coded
Number coded
Android Phone
Specifications depend on the specific device.
Widely and easily accessible.
The vast majority of mobile phones on the market today have a built-in camera.
YOLO - You Only Look Once
An image detection architecture that the project is based on.
CUDA cores provided by the GPU
CPU
Inference
Training
Personal Computer
Rented Dedicated Server
Advantages
Disadvantages
Advantages
Disadvantages
Local - Provided a local machine is already owned, it is immediately available.
Utilises multiple GPUs - Quicker epoch computations, resulting in quicker training.
Cloud based
Allows for parallel computing, as opposed to using your personal computer at home,
which will slow down any work required to be done on the computer.
Cloud based - upload and download times
Datasets tend to be considerably big in size.
A smaller dataset of ~2000 images takes up ~3 GB of space.
This is not a significant amount of data for a local machine to transfer, but it is a
considerable amount for uploading.
Cost
The bigger the server - the higher the rates become.
Cost
As opposed to a rented server - acquiring your own machine has the benefit of
owning the machine, and being able to use it indefinitely (Or until it eventually
breaks.)
While the initial cost of acquiring an adequate machine for deep learning is higher
than renting a server for a few months, it is a worthwhile long-term investment into a
machine that can be used for a variety of casual or intensive tasks.
Setup time
Setup time
Speed
Speed
When compared to a sophisticated server that runs many GPUs - a local machine
will most likely process the training at a slower rate than a dedicated server would.
A local machine will likely contain one, maybe two GPUs.
Pictures are taken from the machine itself. No upload/download times.
Devices
Discussion
After the training is done - which usually takes tens, and sometimes hundreds, of hours, depending on the size of the dataset and the epoch count - running the trained model for inference only takes milliseconds to process a single frame.
R-CNN
Description
Disadvantages
Not real-time.
On average, takes 47 seconds to process a single frame.
Discussion
It should be noted that R-CNN has successors called Fast R-CNN and Faster R-CNN.
However, even the fastest of the successors still barely manage 5 frames a second
at best.
R-CNN, which stands for Region Based Convolutional Neural Networks, was released in 2013. Like other object detection architectures, R-CNN takes an input image and outlines bounding boxes where it believes an item of a certain class is present.
While 5 frames a second is an impressive and definitely useable result, there are
alternative architectures that offer a significant improvement in inference time.
Developed by Ross Girshick
SSD
Description
SSD, which stands for Single Shot Detector, was released in 2017.
Developed mostly by Max deGroot and Ellis Brown
Discussion
Offers great framerates, averaging 45 frames per second when tested on a now relatively old graphics card, the NVIDIA GTX 1060.
Disadvantages
According to the Git repository, the project was seemingly abandoned about 4 years
ago.
According to the Git repository, the project was seemingly abandoned about 5 years
ago.
Discussion
The component of a computer where the core computations are processed.
An optional component of a computer that is dedicated to, and optimised for, computing graphical tasks.
Existing labelling related software offers quality of life features, such as rough auto
labelling of the images, which only requires the user to adjust the bounding boxes
and confirm their validity, rather than having to define the boxes from start to finish.
Inference Example
Inference Example
Observations
Surprisingly good results for a model trained from 120 images, with confidence
values above 0.8 and sometimes over 0.9!
Pre-trained model used
yolov5s
Architecture
YOLOv5
Architecture
Pre-trained model used
yolov5m
YOLOv5
Architecture
Pre-trained model used
yolov5m
YOLOv5
Observations
Rather poor results. Confidence values usually below 0.7, struggled to classify
accurately.
Observations
Great results with confidence values consistently above 0.8, classifying all classes
accurately!
Technology utilised
Deep learning computation with CPU Cores and GPU CUDA Cores running in
parallel.
220Ohm resistor example
Color codes
Red = 2
Brown = 1
Gold = 5% tolerance
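The lookup illustrated by this example can be sketched as a small decoding function. The digit and tolerance tables below are the standard 4-band resistor color code; the function and its inputs are illustrative, not the project's actual code.

```python
# Illustrative sketch: turn detected 4-band colors into a resistance value,
# matching the example above: red (2), red (2), brown (x10^1), gold (5%).
DIGITS = {"black": 0, "brown": 1, "red": 2, "orange": 3, "yellow": 4,
          "green": 5, "blue": 6, "violet": 7, "grey": 8, "white": 9}
TOLERANCE = {"brown": 1.0, "red": 2.0, "gold": 5.0, "silver": 10.0}

def decode_4band(bands):
    """bands: band colors in order, e.g. ['red', 'red', 'brown', 'gold']."""
    d1, d2, mult, tol = bands
    ohms = (10 * DIGITS[d1] + DIGITS[d2]) * 10 ** DIGITS[mult]
    return ohms, TOLERANCE[tol]

ohms, tol = decode_4band(["red", "red", "brown", "gold"])
print(f"{ohms} Ohm, ±{tol}%")  # 220 Ohm, ±5.0%
```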
100nF capacitor example
Unfortunately for the purposes of automatic identification of Integrated Circuit
markings, most IC manufacturers do not follow any global standard for marking their
ICs.
Most manufacturers tend to have their own internal IC marking standards.
Due to this fact - only known markings can be used to identify components.
Mixed manufacturer ICs example
This example illustrates the vast variation in markings, and the inability to identify them without access to datasheets.
YOLOv5
Architecture
A board of insulating material clad with copper, parts of which are etched away so that only conductive tracks remain in specific positions that are pre-planned using CAD software.
Widely used to implement electronic circuits.
CAD - Computer Aided Design
CAD software accelerates and automates designs in various different fields.
Instructions are given to the computer through more complex and intuitive, usually GUI-based, interactive programs.
Electrical current that periodically reverses direction.
Electrical current that flows in one constant direction.
An electrical component that emits light when current is passed through it.
A resistor that varies in resistance relative to the amount of light the body of the component is exposed to.
A ring light has been introduced for both training and inference running.
Progress
Issues encountered
A glitch in the augmentation provided by YOLOv5, where rotation during augmentation shifted the bounding boxes of the components, causing inaccurate feedback to the model and preventing it from training appropriately.
Actual bounding boxes after rotation augmentation
Note the unnecessarily expanded bounding boxes.
Description
Submitted GitHub issue
Link
https://github.com/ultralytics/yolov5/issues/10639
Information gathered from replies as of today's date
This issue has been reported to be part of YOLOv7 augmentation also.
Example
Expected bounding boxes after rotation augmentation
Note the snug fit of the bounding box around the edges of the component.
That is desirable, as it provides accurate information on what the model should be looking
for.
This will train the model in undesirable ways, detecting parts it should not.
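The effect can be demonstrated with a short sketch: rotating only the four corners of an axis-aligned box and re-fitting an axis-aligned box around them necessarily expands the label (except at multiples of 90°), whereas a correct pipeline re-fits the box from the rotated object outline. This is an illustrative Python sketch, not the YOLOv5 augmentation code.

```python
# Illustrative sketch: why naive rotation of bounding-box corners inflates
# labels for long, thin components such as resistors.
import math

def rotate_point(x, y, angle_deg):
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))

def refit_box_area(corners, angle_deg):
    """Area of the axis-aligned box re-fitted around the rotated corners."""
    pts = [rotate_point(x, y, angle_deg) for x, y in corners]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

# A 100x20 label (e.g. a resistor), rotated by 45 degrees.
corners = [(0, 0), (100, 0), (100, 20), (0, 20)]
original_area = 100 * 20
print(original_area, round(refit_box_area(corners, 45)))  # 2000 7200
```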
Augmentation rotation issue
Software
Description
Augmentation
Training
Description
Training versus Evaluation
Labels
Windows/Linux/Mac Desktop/Laptop Machine
Discussion
Discussion
Specifications depend on the specific device.
The specs of a desktop/laptop machine will most likely beat the specs of both a
phone, and a microprocessor.
Desktops are widely accessible in environments where it would be relevant to use
this project, such as the home of the user, or the campus a student is in.
Ease of access
Ease of access
Most people own a mobile device, and have it on them in most cases.
Ease of access
Due to the device being specialised for neural computations, it is not a common
device by any means.
Combined with the price tag of ~€100, this device will likely only be owned by developers, as opposed to users.
As this device is unlikely to be owned by a user of the project, it would not be wise to
require owning one to run our inference.
Discussion
The project will be able to support a compute stick as an alternative to a GPU.
The machine must have permissions for USB connections and running the
application.
The app may be obtained from an app store, which mobile devices have easy access to as long as they have access to the internet.
Specifications
Specifications
There are countless types of android devices on the market, all with varying
specifications.
Camera
Platforms
The application is designed through Qt Creator.
Qt Creator is cross-platform.
Cross-Platform
Microprocessors
USB computation extensions
How long, specifically, is directly tied to the speed of the hardware that the model is being run on, and the size of the model.
Even with all the speed optimisations offered by the YOLO family, a lower end
device such as a Raspberry Pi 4 may take 1-2 seconds to process a single 360p
image.
It is important to pick appropriate hardware for your particular use cases.
CPU
Core Count
4
GPU
Maximum Frequency
1.5GHz
CPU
Core Count
1
Maximum Frequency
1GHz
Core Count
Maximum Frequency
GPU
2
532MHz
CPU
GPU
Core Count
4
Maximum Frequency
1.479GHz
Core Count
Maximum Frequency
128
921MHz
Observations
This microprocessor is targeted towards quick graphical computations, which can
instead be used for deep learning.
Discussion
Conclusion
The original project concept, at the time of the project proposal, has since had a slight shift in focus.
Existing Solutions
The Problem
Importance of object detection
Algorithmic object detection
Detection of objects from an image
There are various cases in which automation of object detection as opposed to
having a human constantly observing footage is beneficial.
Quality control
Security
Analysis
AI based Object Detection
Production
In the field of security through digital cameras, object detection is an incredibly useful tool for monitoring for potential intruders at a facility.
For facilities that utilise dozens, or even hundreds, of cameras - object detection is a very valuable tool that requires minimal human interaction, with a high level of certainty, and 24/7 attention.
Depending on the security requirements, lower-security facilities may not need to hire a person for constant monitoring of the security footage, as object detection offers live notifications for any unexpected activity observed.
During production of anything from farm produce, to electronic components -
consistent detection and rejection of items with damage is essential.
With the use of object detection, flaws can be recognised incredibly efficiently, and this information may be passed on to the production line - identifying exactly which item was detected to have flaws, to be discarded automatically, without ever requiring any human interaction.
If the detection is sophisticated enough to be more reliable than a person, this opens up new opportunities for the speed and efficiency of the production, as a computer's computational power may be expanded, unlike a person's.
Thorough inspection of potential objects on final products is a crucial part of many
fields of production.
Features that may have developed during the process of manufacture may be detected on the final products.
This includes positive, negative, or purely analytical features.
Examples
Developments of a petri dish colony
PCB manufacturing error
Potential signs of disease on farms
Material production flaws
Algorithm based detection can be used to effectively identify very specific criteria,
which can be expressed as an analytical value, or a trend.
Specific color based properties.
Specific shapes by following line trends.
Must be coherent shapes, does not perform well with partial shapes.
Trees, cats, dogs, people, cars, etc.
Specific patterns.
Training using Deep Learning
Epoch
Inference
A complex combination of usually many millions of digital neurons, with analog
based values for each neuron, which result in a form of decision making based on a
massive combination of criteria, rather than individual pixels.
The process of tweaking the entire model based on the existing parameters and the current output that it produces - by use of a loss function, randomness, and clever technique - in hopes of improving the detection ability on the data the model was trained upon.
These neurons work together to identify the incoming information and produce an
output that resembles what the network has learned during the training of the model.
Training is usually based on a pre-trained model, which is trained on a big dataset.
Most pre-trained models are trained on the COCO dataset, which is publicly
available, and holds a vast amount of data. This kickstarts the model with data that it
can repurpose to use as a base.
Initially, the model parameters are set to a random state.
The best performing model is kept between the newly trained model, and the
previous best.
Many epochs are run in order to polish the model as much as possible.
The output that it produces when fed input data is, of course, also random.
Epochs are run on the model to train it.
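The keep-the-best epoch loop described above can be illustrated with a deliberately simplified toy in Python: a random search over a single parameter rather than real deep learning, with an assumed stand-in loss function.

```python
# Toy sketch of the epoch loop described above (not real deep learning):
# parameters start random, each "epoch" proposes an update, and the
# best-performing parameters seen so far are kept.
import random

def loss(w):
    """Stand-in loss function: distance from an optimum unknown to the loop."""
    return abs(w - 3.0)

random.seed(0)
best_w = random.uniform(-10, 10)        # random initial state
best_loss = loss(best_w)
for epoch in range(200):                # many epochs polish the "model"
    candidate = best_w + random.uniform(-0.5, 0.5)
    if loss(candidate) < best_loss:     # keep the better of new vs previous best
        best_w, best_loss = candidate, loss(candidate)
print(round(best_loss, 3))
```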
In object detection, inference is the utilisation of the model to process classifications
of objects on the image.
Classification
The process of using the steps provided in the model to identify objects from an
input image, through steps that depend on the architecture used.
The identified objects are marked with a bounding box, the class they belong to, and the confidence value of the prediction.
Because of how far ahead the YOLO architecture is when compared to most other architectures, it is utilised very commonly throughout object detection projects.
Internal steps of the You Only Look Once Inference
The input image.
The input image is split into an S by S grid, S being 7 in this example.
Each cell predicts the bounding boxes and confidence values of each box.
These steps are repeated for each of the grid cells, until everything is identified.
All the identified bounding boxes.
And all the probabilities for each grid cell.
Each of the bounding boxes are checked for how much they cover of each
probability cell, and are "shaded" in the case of this example with those probabilities.
Finally, the bounding boxes are reduced using detection thresholds and NMS.
NMS - Non-maximum Suppression
A filtering technique used on predictions of object detectors.
Keeps the highest-confidence bounding box, and suppresses overlapping boxes whose intersection with it exceeds a threshold.
REFERENCE https://web.cs.ucdavis.edu/~yjlee/teaching/ecs289g-winter2018/YOLO.pdf
This is the final output that YOLO provides: Bounding boxes with classification (In
this example, the classification is marked by a color. The real output is a string.), and
the confidence value (In this example, confidence is marked by the opacity of the
boxes. The real output is a number ranging from 0 to 1)
The project was intended to focus on a more familiar field, more on the theme of a high-precision tool, which requires a set rig to push beyond limitations and produce high-quality results.
At some point, the focus has shifted towards a more flexible, but less precise concept with
the consideration of more generic use through mobile deployment.
As mobile deployment of object detection was not explored beforehand nor planned for,
this has put additional strain on the timeline.
Great results have been achieved in object detection.
The model has been trained on over 3000 images that were taken and labeled over the course of several months.
At this rate, a significant increase in the dataset would be required for adequate results
outside of a set rig, which would require a significant amount of additional time investment
that is outside of the time constraints for this project.
This, of course, is not a viable option within the provided time constraints, but should be considered if the project were to be continued in the future.
Introduction of random rotation, with respect to the label position. Specified in a 0 to
360° range.
Introduction of random spots of blur. Specified in frequency and intensity.
Introduction of random scaling of the image, at a specified frequency and range of
scaling.
Simulates the real-life scenario of the feed being provided at a different angle.
Simulates the real-life scenario of a change in focus, fog, or smudges on the camera lens.
Simulates the real-life scenario of distance. Up-close objects will cover a far bigger pixel area in an image than far-away ones would, and the model should be trained against this to prevent detection only of objects at certain distances.
Cutting and joining of images into new images.
Creates additional images from the existing dataset that appear unique, making the most of the existing dataset, with only a slight loss in the information stored with regard to training.
Simulates bright objects interacting with the lens, ensuring the model does not get
confused about glare in real life scenarios.
Introduction of random glares, of specified frequency and intensity.
Introduction of random spots and smudges.
Simulates dust, dirt, and other particles that may be present in a real life scenario,
ensuring the model does not get confused by a partial coverage of an object.
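The rotation augmentation has to move the labels together with the pixels. As a minimal sketch of the underlying geometry (not the augmentation tool's actual code - the function names here are illustrative), the enlarged canvas and the remapped label centre can be computed as:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

constexpr double kPi = 3.14159265358979323846;

// Size of the canvas needed to hold a w x h image rotated by `deg` degrees.
std::pair<double, double> rotatedCanvas(double w, double h, double deg) {
    const double r = deg * kPi / 180.0;
    const double c = std::fabs(std::cos(r)), s = std::fabs(std::sin(r));
    return { w * c + h * s, w * s + h * c };
}

// Remap a label centre (x, y) from the original image into the rotated canvas.
std::pair<double, double> rotateLabel(double x, double y,
                                      double w, double h, double deg) {
    const double r = deg * kPi / 180.0;
    const auto canvas = rotatedCanvas(w, h, deg);
    // Rotate about the image centre, then shift into the new canvas centre.
    const double dx = x - w / 2.0, dy = y - h / 2.0;
    return { dx * std::cos(r) - dy * std::sin(r) + canvas.first / 2.0,
             dx * std::sin(r) + dy * std::cos(r) + canvas.second / 2.0 };
}
```

For example, rotatedCanvas(200, 100, 90) yields a 100×200 canvas, matching the earlier observation that rotation changes both the width and the height of the image.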
Abstract
Acknowledgement
I am very grateful to Technological University Dublin for accepting me as a student,
and providing me with the opportunity to take on this project.
Declaration
The material contained in this assignment is the author’s original work, except where work
quoted is duly acknowledged in the text. No aspect of this assignment has been previously
submitted in any other unit or course.
IDE - Integrated Development Environment
A sophisticated text editor, designed specifically for code development in certain
languages.
Most IDEs today come with a high range of optional plugins that can be used to
further increase production speed, and reduce redundant tasks via automation of
said tasks.
A term often used in deep learning, which is the process of attempting to simulate
intelligence that is similar to that of a human, by the use of a computer, in order to
tackle issues that a standard style of computer operation either fails to, or performs
at incredibly slow rates.
Course
TU807
Year
4th
Title
Bachelor of Engineering (Honours) in Computer Engineering in Mobile Systems
Code
I’d like to acknowledge and express my gratitude towards my project supervisor,
Benjamin Toland – who took on me and my custom project, and provided excellent
guidance and feedback throughout its development.
I am also very grateful for the existence and availability of search engines, the
primary one used for this project being Google. It is a valuable resource for research,
though one should not take every search result at face value, and should ensure at
least a few trusted sources agree with the findings.
I would also like to thank my wonderful old and new friends and peers that I was
able to meet thanks to my ability to attend TUD Blanchardstown Campus, and the
quiet spaces provided for us to further our education with minimal interruptions.
I am also grateful to my brother, Justas Bartnykas – who is also an engineer. I am
grateful to him for introducing me to Qt Creator, which has now been my go-to IDE for
C++ development for over 5 years. He also provided access to a fairly expensive
camera that he owned, which allowed the project to begin sooner than it otherwise
could have.
I’d like to offer my sincere apology and thanks to anyone else I may have missed
that has contributed to this project in any way.
When we look at an image, we can immediately discern objects that are displayed,
without ever having to think about it. It is effortless.
We use a combination of past experiences and deduction to determine information
almost immediately.
This ability of ours as humans is thanks to millions of years of evolution, in a world
where not detecting a potential threat makes a difference between life and death. It
is in our subconscious nature to detect objects at a moment's notice.
The problem arises when we want to implement a computer to process object
detection for us.
A computer is built upon the most simple of arithmetic tasks, combined together from
an incredible amount of transistors that perform these tasks, into massive systems -
which operate entirely in digital binary.
A computer has no concept of learning from past experiences, nor any feelings that
may affect its decisions. Given the same instructions, it will produce the same
results on its first day of manufacture, and the last day of its operation (provided the
unit was not damaged in a way that it would yield unpredictable results).
Computers are designed to process computational tasks at complete precision and
consistency.
When a computer is exposed to an image in the form of a pixel grid, all it truly sees is
numerical values that are assigned to each cell of the grid.
Computers have no concept of color, let alone real life objects.
The simple action of either moving or scaling an object completely changes the
arrangement of the pixels that represent this object.
To a person - this is no issue. It is obvious that this is still the same object. However,
to a computer - the data has just shifted around entirely.
Examples
A close up of a 220ohm resistor
An even further close up of the first band, where each pixel is beginning to show
The same band, rotated at an arbitrary angle
The width and height of the image have both changed due to this rotation.
The order of the pixels has changed drastically.
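The pixel-order point can be made concrete with a toy one-row "image": shifting a bright two-pixel object sideways by a single pixel changes the raw values at several positions, even though to a person it is plainly the same object. A minimal sketch (not project code):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A 1x5 greyscale "image": a bright 2-pixel object on a dark (0) background.
using Row = std::array<int, 5>;

// Shift the whole row right by one pixel; background fills the vacated cell.
Row shiftRight(const Row& in) {
    Row out{};  // zero-initialised: out[0] stays background
    for (std::size_t i = 1; i < in.size(); ++i) out[i] = in[i - 1];
    return out;
}
```

The shifted row compares unequal to the original, element for element - which is all the computer has to go on.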
However, it will struggle heavily at detecting and identifying any generic object.
Examples
While this is a complex issue to tackle already, it only becomes more complex when
we want to detect something generic, such as a tree.
A tree has many properties that may differ, such as: presence, color, shape, and size
of leaves/needles, thickness, size, and color of trunk, branching styles, etc. While
still being effortlessly identifiable by an intelligent creature.
The dataset is separated into two categories - one on which the model is adjusted, and
another on which the model is run to determine the new confidence values achieved by
the alterations.
Both datasets must be labeled for fully automated evaluation.
Labeling is an essential part of training for Object Detection.
Labels are bounding boxes that specify which object is present where in the input
dataset, so that the model can learn to detect them.
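For reference, YOLO-style labels store one object per line as class x_center y_center width height, with all coordinates normalised to the 0-1 range. A minimal sketch of converting a pixel-space box into that format (function and parameter names are illustrative, not the project's code):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Convert a pixel-space box (top-left x/y, width, height) into a
// normalised YOLO label line for an image of size imgW x imgH.
std::string toYoloLabel(int cls, double x, double y, double w, double h,
                        double imgW, double imgH) {
    std::ostringstream out;
    out << cls << ' '
        << (x + w / 2.0) / imgW << ' '   // x centre, normalised
        << (y + h / 2.0) / imgH << ' '   // y centre, normalised
        << w / imgW << ' ' << h / imgH;  // box size, normalised
    return out.str();
}
```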
Neural Networks
The networks are trained through deep learning.
Neural networks are an imitation of biological creatures' intelligence, which is known
as Artificial Intelligence. AI is designed to be run on a computer.
Description
It has the ability to be trained, which simulates past experience of biological
intelligence.
In the scope of this project, somewhere between 250 and 500 epochs will suffice.
The boost in results is limited to the set rig.
Author
B00125142 Violet Concordia
Supervisor
Benjamin Toland
Several of the same type of component, all packaged in a single line.
Example: a SIP resistor network, which features multiple resistors, all connected to a
single common ground pin.
Observations
Calculations
Calculations
Gantt Chart as Table
Choice
Camera
Rig
Ring Light
Model
OSY-C100-4-12
Specifications
HDMI
USB
Framerate
30fps
Video
Image
MegaPixels
Pixel Size
1.55µm × 1.55µm
12
4k
1080p
Framerate
60fps
1080p
Framerate
30fps
Chosen mode of operation
USB connection, 1080p, 30fps.
This mode was chosen due to the high compatibility range of a USB connection, and
both the resolution and framerate being more than sufficient for the purposes of the
project.
Specifications
Connection
AC power plug.
Dimensions
Height
Width
Length
22.6mm
95mm
59mm
65mm is the maximum diameter this ring light can be attached to.
Features
Adjustable intensity via a knob on the side.
There is a high variety of camera rigs available for purchase and for the purposes of
this project - as long as the rig has a surface and the camera has a convenient way
to be mounted - there is not much of a difference on which rig is chosen, provided it
is adequate in dimensions and is capable of steadily mounting the camera.
That is the reasoning for choosing a rig that is quite basic, but does the job well, and
was already available at home.
Dimensions
Base Width
Base Length
280mm
180mm
Choice Rationale
Choice Rationale
For the purposes of object detection at the scale of this project, 1080p at 30 frames
per second is plenty of incoming data.
The quality of the output images has been tested to be sufficient.
While this camera is over-qualified for the project, as mentioned in the
Acknowledgement section, this camera was available at the start of the project, and
was chosen out of convenience of availability.
Height
175mm to 350mm (Adjustable arms)
Choice Rationale
This ring light was chosen due to being far more than sufficient in supplying
adequate lighting, and being reasonably cheap.
Cost
~250€
Cost
~15€
Cost
~40€
This report illustrates the development of a 4th year Computer Engineering Project,
which is designed to tackle the issue of computer-based detection of objects present
in any given image from a camera image/video feed.
While object detection is a very vast topic, the project focuses specifically on
electronic component detection and identification.
Specialisation to read additional information from the electronic components, such
as their rating values, or marking codes.
The focus audience of this project is engineers.
It hopes to assist people in becoming engineers by providing a useful identification
tool, and by sparking curiosity to investigate both the electronic components and the
inner workings of the project. It also aims to be a valuable tool for existing engineers
by assisting identification of electronic components in bulk.
The project utilises Artificial Intelligence to achieve its goals, with high confidence
values.
To run AI, a model is required. The model must be trained from a dataset.
The dataset for the training of the model is gathered through the set rig, which is a
camera stand with a surface that the items are placed on, and the camera with a
ring light positioned directly above the placed items, for optimal and consistent
viewing angles and visibility.
The project is controlled through a GUI application that provides feedback on the
information gathered, and tools to interact with the device. Currently the GUI
application is deployed on a desktop machine, and the camera on a set rig.
The desktop receives the information from the camera through a USB connection.
The purpose of the set rig is both to provide an efficient environment for data
gathering, and to constrain the amount of variation a tested-upon image is exposed to.
It drastically reduces the angle, lighting, distance, and background variations of the
input images, which is highly beneficial for higher confidence values in the object
detections.
For the ideal future, which may be out of scope of the current time and resource
constraints:
Thanks to the project being developed and running on C++, in addition to using the
YOLO architecture – the project has the opportunity to expand into a high-confidence,
high-efficiency system that constantly improves through a user-submitted dataset, and
that is deployable on desktop computers, laptops, microprocessors, and mobile
(Android/iOS) devices.
Live Labeling
The ability to take a snapshot of the current frame, defining appropriate labels, and
saving this labeled snapshot for future training. All from inside the GUI.
Alternatively, taking snapshots of the GUI and saving them for later labeling.
It should be noted that the images must be adequately labeled.
If mass deployed with the ability to live label and submit the images to a server for
further training, there must be a sophisticated method of ensuring that the submitted
data is 100% accurate.
If an image is mis-labeled, or an object is missed – this would cause mis-training, or
worse – de-training of the model, corrupting the dataset and setting the training
backwards rather than forwards.
Potential solutions
Manual inspection of submitted images, for the purpose of quality control, which is
crucial for the success of the model.
Submission of
Labeled images
Requires manual inspection.
Un-labeled images
Requires manual labeling.
Both of the solutions pose the issue of requiring manual labor from a person who is
not the user, and who is adequately knowledgeable about which data is sufficient in both
image quality and labeling – which causes the demand for paid workers to grow linearly
with the submission flow rate.
In conclusion - while the project did not meet all the new goals - it did explore them in
detail.
The training process for the model was well-planned and executed, resulting in a well-tuned
model that is capable of meeting the criteria for a set-rig operation of the project.
Setup
OpenCV Extra Modules
Qt Creator
CMake
Version
3.16.3-1ubuntu1.20.04.1
Description
Free, open-source software, designed to automate building, testing, packaging, and
installing software, using compiler-independent methods.
Platform
Cross-platform
CMake does not build - rather, it generates another system's build files.
The user is able to specify exactly which components they would like to be included, and
their properties.
The generated build files are then used to build from.
Required items
Installation
Linux
Terminal command to install cmake-gui (on Ubuntu/Debian): sudo apt install cmake-qt-gui
OpenCV
CUDA
Download the appropriate version for your machine
https://developer.nvidia.com/cuda-downloads?
target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=11&target_typ
e=deb_local
Let ~/opencv/ be our cmake working directory.
From a terminal in ~/opencv/:
Clone the OpenCV source repository: git clone https://github.com/opencv/opencv.git
Enter the repository directory: cd ./opencv
Checkout this specific version: git checkout 4.7.0
From a terminal in ~/opencv/:
Clone the OpenCV Extra Modules source repository: git clone https://github.com/opencv/opencv_contrib.git
Enter the repository directory: cd ./opencv_contrib
Checkout this specific version: git checkout 4.7.0
Download Link
Extract into ~/opencv/[filename]
This command may be used:
tar -xvf ./cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
Note: the v argument is optional. It stands for verbose, which prints out every file
being extracted.
Finally, the ~/opencv/ directory should have the following content
Working directory
We also need an empty ~/opencv/build directory for the output of CMake
Note:
Libraries
Generation of build files using CMake GUI
Open CMake
enter cmake-gui into the terminal
First we have to select the source and binaries paths
Source path should be ~/opencv/opencv
binaries path should be ~/opencv/build
Then click configure
Configure is highlighted in blue
This setup has been carried out on the previously mentioned Lubuntu 20.04 machine.
Steps may slightly vary on other Operating Systems.
A custom UI theme was used on these machines.
Press Configure
OpenCV
Neural network
Qt Creator
Extras
If any of the options were not available, simply press configure again and they should
appear.
Press Generate.
ENABLE_FAST_MATH
CUDA_FAST_MATH
OPENCV_ENABLE_NONFREE
WITH_PNG
WITH_FFMPEG
BUILD_opencv_dnn
Qt6_DIR
OPENCV_EXTRA_MODULES_PATH
WITH_GTK_2_X
WITH_GTK
Qt6*
WITH_QT
WITH_ONNX
CUDA
The default options of Unix Makefiles and Use default native compilers will most likely
suffice. Choose Finish.
Missing directories (follow pattern of others)
/opt/Qt/6.2.4/gcc_64/lib/cmake/Qt6
WITH_NGRAPH
WITH_INF_ENGINE
WITH_CUDNN
WITH_CUDA
CUDA_TOOLKIT_INCLUDE
CUDA_VERSION
CUDA_TOOLKIT_ROOT_DIR
CUDNN_INCLUDE_DIR
CUDNN_LIBRARY
OPENCV_DNN_CUDA
~/opencv/opencv_contrib-4.x/modules
Make sure expected version is found.
11.4 was chosen for this project.
~/opencv/cudnn-linux-x86_64-8.6.0.163_cuda11-archive/lib/libcudnn.so
~/opencv/cudnn-linux-x86_64-8.6.0.163_cuda11-archive/include
/usr/local/cuda/include
/usr/local/cuda
Flags setup
The reason for this is that some flags cause other flags to appear.
For example: WITH_CUDNN is required for CUDNN related settings.
Repeat until all the flags are as specified.
The ~/opencv/build directory will be populated with all the necessary build files.
Build
Navigate to ~/opencv/build from your terminal.
Execute the command: nproc
Returns how many cores the machine has.
From ~/opencv/build, execute the command: make -j $(nproc)
$(nproc) expands to the core count, so the command becomes make -j 8 if the machine has
8 cores.
Installation
sudo make install
Throughout the project, the machines presented were not consistent.
A personal laptop and a personal desktop were used.
Performance-wise, this is insignificant, as both machines are running the same version of
Lubuntu 20.04, and both utilise an NVIDIA GPU that has CUDA cores.
This process took somewhere around 4 hours.
The duration of this process is heavily dependent on the specs of the machine.
OpenCV
Finally, OpenCV is now fully installed, and ready to be used.
Project
Qt Widgets Application
Project name and path options
Build System: CMake
Build Kits
For the dev environment and basic setup, we want to use the Desktop kit.
In the future, an Android kit may be necessary, to deploy the project on Android devices.
We are using CMake to generate build files once again, this time the terminal version.
Creation
Setup
CMakeLists.txt (project file) configuration - including the required libraries
# CUDA_START
set(CUDA_TOOLKIT_ROOT_DIR "/usr/local/cuda-11.4")
find_package(CUDA 11.4 REQUIRED)
set(CMAKE_CUDA_STANDARD 11)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
# CUDA_END
# OPENCV_START
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
target_include_directories(4th_year_project PRIVATE "/usr/local/include/opencv4")
target_link_libraries(4th_year_project PRIVATE ${OpenCV_LIBS})
# OPENCV_END
The CUDA version matches the one compiled during OpenCV installation
Installation
Version
6.2.4
The Open Source version was used for this project.
We want the Qt Online Installer.
The system is running Linux - we need the matching version of the installer.
Online Installer - version selection
The UI design overview
Note: The design of the project is up to interpretation, and only key parts will be covered
briefly.
This project is open source, and the reader may clone the repository to run it from this point
on.
The rest of this section will no longer be a tutorial, but an overview of the project internals.
Systems
Inference
Post-processing
Camera
UI
Orientation
Optical Character Recognition
Confidence Threshold
Score Threshold
NMS Threshold
Detection Struct
classId
Internal logic, does not concern the user - used to determine the class name.
className
Name of the class (Resistor/Capacitor/etc...)
extra
Post processing information, appended to the name during display
confidence
How confident the inference is that this detection is indeed this class, and at this
location. Ranges from 0 to 1, i.e. 0% to 100%.
color
Display color of this detection.
box
The location and size of the detection, represented as a box.
mat
A cropped out image from the original, from the position and size of the box.
Confidence requirement, under which the detection is rejected.
Score requirement, under which the detection is rejected.
NMS requirement, under which the detection is rejected.
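Putting the fields above together, the detection record can be sketched as a plain struct. The real project presumably uses OpenCV types (cv::Rect for box, cv::Mat for mat); stand-in types are used here so the sketch stays self-contained:

```cpp
#include <cassert>
#include <string>

// Stand-in for cv::Rect: position and size of a detection on the frame.
struct Box { int x = 0, y = 0, width = 0, height = 0; };

struct Detection {
    int classId = -1;         // internal index, used to look up the class name
    std::string className;    // "Resistor", "Capacitor", ...
    std::string extra;        // post-processing info, appended on display
    float confidence = 0.0f;  // 0..1: how sure inference is about this box
    unsigned color = 0;       // display colour of this detection (packed RGB)
    Box box;                  // location and size of the detection
    // cv::Mat mat;           // crop of the original image at `box` (omitted here)
};

// Reject detections that fall below the configured confidence threshold.
bool passesThreshold(const Detection& d, float confThreshold) {
    return d.confidence >= confThreshold;
}
```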
Saturation adjustment
Utilises Histogram equalisation provided by OpenCV libraries to roughly adjust every
detection to a specific saturation value, in order to make post-processing more consistent.
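The project relies on OpenCV's built-in histogram equalisation; as a rough illustration of what that operation does to a single 8-bit channel, here is the textbook CDF-based remap (not the project's code):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Textbook histogram equalisation of one 8-bit channel.
std::vector<std::uint8_t> equalize(const std::vector<std::uint8_t>& in) {
    std::array<int, 256> hist{};
    for (auto v : in) ++hist[v];
    // Cumulative distribution of pixel values.
    std::array<int, 256> cdf{};
    int running = 0;
    for (int i = 0; i < 256; ++i) { running += hist[i]; cdf[i] = running; }
    // Smallest non-zero CDF value, used to stretch the output to 0..255.
    int cdfMin = 0;
    for (int i = 0; i < 256; ++i) if (cdf[i] > 0) { cdfMin = cdf[i]; break; }
    const int n = static_cast<int>(in.size());
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (n == cdfMin) { out[i] = in[i]; continue; }  // flat image: no change
        out[i] = static_cast<std::uint8_t>(
            (cdf[in[i]] - cdfMin) * 255 / (n - cdfMin));
    }
    return out;
}
```

A channel containing only the values 50 and 100 gets stretched to 0 and 255, which is the "brightness now matches the others" effect described later in the calibration section.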
Draws contours around a greyscale version of the component, in order to determine its
orientation.
The images are automatically rotated to face forward.
The extra space that is caused by rotation is filled with pure black pixels.
Resistor Decoding Algorithm
Optimisation
The decoding algorithm method
The algorithm was designed in a separate project, and the values that were found to do the
job best were left as defaults.
Arguments
inputMat
The cropped out, prepared, and rotated input image of a resistor.
Example results
Final result of the algorithm, identifying 4 bands with colors: Yellow, Purple, Green, and
Gold.
Bulk calibration
Early bulk testing
Determined resistor values are displayed as the last numbers in the window name. The
algorithm achieved 100% success rate for this particular random batch test.
Bulk example
Before saturation adjustment
After saturation adjustment
Base algorithm testing
Link
https://www.qt.io/download
Boundaries
Thanks to the set shape of a resistor, the algorithm is able to narrow down on which area
the bands are present in.
This means there is no need to process outside of the predicted magenta bounds.
This reduces the required processing power to roughly a quarter on average (the ratio
between the area inside the magenta bounds and the entire image area).
Glare-destroyed information filtering
qreal middleCropBoundPos
Ranges 0 to 1
Controls how much of the vertical middle is cropped out. (Will be covered in further detail
under the Glare section.)
qreal horizontalBoundPos
Ranges 0 to 1
Controls the horizontal magenta bounds position
qreal verticalBoundPos
Ranges 0 to 1
Controls the vertical magenta bounds position
HSV Filter
Minimum/Maximum
Hue
Saturation
Value
Filters pixels inclusively/exclusively (depending on whether min or max is the higher
value), replacing them with RGB (0, 0, 0) (black).
bool display
True or False
Debug feature, visualises the internal workings of the algorithm. Used to provide the media
in this report.
Ranges 0 to 1
Ranges 0 to 1
Ranges 0 to 1
QString displayName
The name of the debug display, if it is enabled.
Over-exaggerated vertical crop from the very middle, for demonstration purposes
Since the information is destroyed by glare, there is no reason to process the white strip in
the middle.
Cropping out from the vertical middle leaves us with more information in the line
horizontally.
Result after HSV filter
Steps
Further preparation processing
The algorithm was originally designed in a separate work environment, with sliders to
control every key parameter.
Dataset
Over 1500 pictures of resistors were gathered from the rig itself, for the algorithm to be
evaluated against.
Resistor body and background filtering, using the HSV filter arguments parameters.
Before HSV filter
After HSV filter
The band extraction algorithm
Steps
Slight blurring to average out the band colors
Before blur
After blur
Example
Before optimisation
After optimisation
Note: The algorithm is calibrated to the optimised version. It expects a specific ratio
between color and blank pixels in the entire row that is being tested.
Notice the lack of processing outside of the bounds
Initialisation
The average hue and the count of non-black (empty) pixels are obtained for every row.
For every row, top to bottom, the ratio of non-zero pixels is compared to the width of
the row being processed.
If the ratio is above a certain threshold, the algorithm assumes this is a color band, and
enters band mode.
Otherwise, it continues to the next row, until the condition is met, or no more rows remain.
Once in band mode, the algorithm starts to keep track of the average hue of the band.
This is what determines the final color of the band.
Once the algorithm detects an insufficient ratio of color in the row, band mode ends.
If the tracked band height was more than a certain small threshold (which aims to prevent
noise counting as bands), the average hue is pushed onto the Bands List, and the algorithm
advances a certain step size forward, because there will be no band immediately after a band.
The steps loop until no more rows remain.
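The steps above can be sketched as a row-scanning loop. The threshold, minimum band height, and step values below are illustrative defaults, not the project's calibrated ones:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-row summary, produced earlier in the pipeline.
struct RowInfo {
    double fillRatio = 0.0;  // non-black pixels / row width
    double avgHue = 0.0;     // average hue of the non-black pixels
};

// Scan rows top to bottom, collecting the average hue of each colour band.
std::vector<double> extractBands(const std::vector<RowInfo>& rows,
                                 double bandRatio = 0.5,
                                 int minBandHeight = 2,
                                 int skipAfterBand = 2) {
    std::vector<double> bands;
    std::size_t i = 0;
    while (i < rows.size()) {
        if (rows[i].fillRatio < bandRatio) { ++i; continue; }  // not in a band
        // Band mode: accumulate hue until the fill ratio drops.
        double hueSum = 0.0;
        int height = 0;
        while (i < rows.size() && rows[i].fillRatio >= bandRatio) {
            hueSum += rows[i].avgHue;
            ++height;
            ++i;
        }
        if (height >= minBandHeight) {   // ignore thin noise
            bands.push_back(hueSum / height);
            i += skipAfterBand;          // no band starts right after a band
        }
    }
    return bands;
}
```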
Color determination
The final band color names are determined from a combination of the hue, saturation,
and value of the colors.
For example, red/brown can only be within the ~330-360° and ~0-30° hue ranges.
This can be further processed by analysing the saturation and value of the color to
determine whether it was a red or a brown.
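The hue-window idea can be sketched as a small classifier. The windows below are illustrative values only, not the project's calibrated ranges (which also consult saturation); brown is separated from red here by its lower value, since brown is effectively a dark red:

```cpp
#include <cassert>
#include <string>

// Classify a band colour from hue (0-360 degrees) and value (0-1).
// Windows are illustrative, not the project's calibration.
std::string bandColour(double hue, double val) {
    // Red and brown share the ~330-360 / 0-30 hue window.
    if (hue >= 330.0 || hue <= 30.0) return val < 0.5 ? "brown" : "red";
    if (hue <= 70.0)  return "yellow";
    if (hue <= 160.0) return "green";
    if (hue <= 260.0) return "blue";
    return "purple";
}
```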
Connection maintenance
Frame retrieval
The application starts with the connected flag being false. This flag determines if it should
try connecting to the camera.
While this flag is false, a QTimer will run, which fires on a separate thread every set
amount of time.
This timer is responsible for attempting to connect to the camera, until it is connected.
If we are unable to communicate with the camera, this indicates it was disconnected, and
the connected flag is set to false again.
Once the connection is established, the connected flag is set to true, and the timer ends.
To replace the connection timer, a frame retrieval timer starts in its place.
The connection is maintained completely automatically.
A frame is requested from the camera every specified amount of time, depending on the
frame rate.
Each frame is sent off to processing, while another one is being retrieved asynchronously,
thanks to QTimer threads.
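The Qt implementation drives this logic with QTimers; the same connect/retrieve behaviour can be modelled as a dependency-free state machine, with the camera calls stubbed out as callbacks (names are illustrative, not the project's identifiers):

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Models the connection logic: while disconnected, each tick tries to
// connect; once connected, each tick retrieves a frame, and a failed
// retrieval flips the state back to disconnected.
class CameraLoop {
public:
    CameraLoop(std::function<bool()> tryConnect, std::function<bool()> grabFrame)
        : tryConnect_(std::move(tryConnect)), grabFrame_(std::move(grabFrame)) {}

    // One timer tick. Returns true if a frame was retrieved this tick.
    bool tick() {
        if (!connected_) {
            connected_ = tryConnect_();  // connection-timer behaviour
            return false;
        }
        if (!grabFrame_()) {             // frame-timer behaviour
            connected_ = false;          // camera unplugged: reconnect later
            return false;
        }
        return true;
    }

    bool connected() const { return connected_; }

private:
    bool connected_ = false;
    std::function<bool()> tryConnect_;
    std::function<bool()> grabFrame_;
};
```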
Zoom control
For ease of focusing the camera and inspecting components up close.
Status bar for configuring what gets run/displayed
Comes with a reset position to middle button.
Inference detections
Post processing
Export button
Exports the entire frame, both before and after inference.
Each detection is exported separately.
Live camera feed
Console and Inference/Post-processing information output boxes
Provide general information about the detections.
This can be achieved by navigating to an appropriate directory to hold the repository, and
entering the command:
git clone https://github.com/Harmonised7/ai_identifier.git
If you are using SSH instead of HTTPS, the command is instead:
git clone git@github.com:Harmonised7/ai_identifier.git
This is the reason the gold band was not found in the unoptimised version.
Found optimal calibration
70% horizontal cut off (vertical magenta bounds)
25% vertical cut off (horizontal magenta bounds)
12.5% cut out from the vertical middle
Full Hue range
35% to 100% Saturation range
Full Value range
The brightness of the images now matches the others to a considerable degree.
This provides us with a more consistent workspace for the algorithm to work from.
Recreating these features may be considered reinventing the wheel.
Threads
Simplified overview of the utilised threads.
Note that the frame updating thread utilises both CPU and GPU threads while running
inference and post-processing.
Some of the steps may appear slightly different on your machine.
Overall, YOLOv8 at the time of writing was not at a state in which it would be preferable to
utilise it in favor of YOLOv5.
YOLOv8 is still in development, and while the planned features will most certainly be
incredible - YOLOv5 was chosen due to the reliability and availability of its existing
features.
Android deployment
Qt Maintenance Tool
Android Studio
Set up a device
Qt Creator
Install the correct Android SDK
Setup kit
Android related libraries selected
Android
Ninja
Overall - all the initial criteria, and some beyond, have been achieved with great success.
Post-processing - while coming at the cost of a considerable amount of processing power -
has yielded very impressive results that could only have been hoped for.
A further boost in performance was instead achieved by transitioning to a bigger model.
This, of course, came at a cost in performance, with the bottleneck dropping from the
camera-side hardware limitation to the limitation of the laptop the project was developed on.
UI improvements
Contrast text
Post Processing
Post processing identifying colors
Horizontal text adjustment
Horizontal text adjustment: middle
Horizontal text adjustment: right side
Horizontal text adjustment: left side
Examples
The text position adjustment was designed to always fit text on the screen, without it
leaving the screen.
To achieve this - the horizontal width of the screen and box, position of the box on
the screen, and the display text width had to be considered.
It is designed for the text to start directly from the start of the component when the
component is all the way on the left, and to end at the end of the component when
the component is all the way to the right, with a smooth transition in-between.
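That smooth transition reduces to a single linear interpolation. A minimal sketch, with illustrative names (not the project's actual identifiers):

```cpp
#include <cassert>

// Horizontal text position: the text starts at the component's left edge when
// the component sits at the left of the screen, ends at its right edge when it
// sits at the right, and blends linearly in between.
double textX(double boxX, double boxW, double textW, double screenW) {
    const double range = screenW - boxW;  // how far the box can travel
    // How far across the screen the component is, 0 (left) to 1 (right).
    double t = range > 0.0 ? boxX / range : 0.5;
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return boxX + t * (boxW - textW);
}
```

With a 1000 px screen, a 200 px box, and 300 px of text: at the left edge the text starts at x = 0 with the box; at the right edge it ends at x = 1000 with the box; halfway across it sits in between - so the text never leaves the screen.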
The text creation was upgraded from plain color text, to a bright outline with a darker
color fill.
This resulted in ease of readability of the text under all background conditions.
Examples
Plain text
Contrast text
Used in precise inspection of the components and camera focus adjustment.
Examples
Zoom example 3
Zoom example 1
Zoom example 2
No zoom
The project was an overall great success, both as a learning experience and as the utility
tool it set out to be.
Unfortunately, it has been concluded that even if the dataset were expanded severalfold -
mobile deployment with a live video feed would simply not be feasible, due to the lack of
CUDA core availability on mobile devices, and the lack of the set-rig environment that the
project has been designed for.
With more resources and advances in mobile technology - this project has the potential to
one day make it into the mobile device market.
An option that was heavily considered was to move the processing onto the cloud to
compensate for the lack of computational power; however - the project's core design was to
provide live video feed processing as an assistant tool that the user may use for live
reference as they move the components around.
Reducing from live video on a screen to manually taking pictures and waiting for the
results to come back would have interfered with the core idea to the point where it would
be a different project, but it was very insightful to research the options, and it has
inspired several future projects.
The post processing algorithms were carried out as initially conceptualised, and turned out
to be a great success!
The initial goals of the project were fulfilled with better than expected results for the
timeframe that was available.
Even in the most ideal of conditions - OCR has more often than not failed to provide
adequate results, while costing a significant amount of processing power.
OCR has proved itself to be rather unreliable. Furthermore - the testing was done while
facing the component upright.
While this is certainly achievable, it is a further hit to the processing power, because
each rotation of 0, 90, 180, and 270 degrees would require a test - increasing the total
required processing power to over 4 times what simply running OCR on a prepared image
would cost.
It has been concluded that OCR is simply not feasible to utilise at runtime for this
project.
Perhaps with further development in OCR efficiency, the project may expand to support
OCR in the future.
Some of the components may have their surface worn, while others are very reflective and
cause significant glare.
In a real-life scenario, the component will face a random direction, so it must first be
rotated to face exactly upwards in order to be readable.
Cloud Machine
Specifications
Performance directly correlates to the cost.
Discussion
Workload
The inference and post-processing workload would be processed solely on the
cloud.
New constraints
Having a live, lossless video feed would be essential for adequate results.
This would put a huge strain on the required processing power on the user's device.
A compromise could be to replace live video feed processing with image
submissions instead.
However, even a mobile device would likely be able to finish processing a single frame in
the time it would take to send the data to the cloud, have the cloud process it, and
receive the processed data back.
In the end, this approach may not reduce the processing power significantly enough, and
may actually introduce latency, depending on the connection quality.
Because the processing would be done on the cloud, access to the internet would be
required.
Cost
Running a server costs a monthly fee that may be avoided by running the server
from home, but doing so would hinder the expandability of the project, most likely
resulting in dissatisfaction through latency for the users.
Post processing identifying ohm values
Created With
EdrawMind